procruste problem
Orthogonal Procrustes problem preserves correlations in synthetic data
Ounissi, Oussama, Jävergård, Nicklas, Muntean, Adrian
This work introduces the application of the Orthogonal Procrustes problem to the generation of synthetic data. The proposed methodology ensures that the resulting synthetic data preserves important statistical relationships among features, specifically the Pearson correlation. An empirical illustration using a large, real-world, tabular dataset of energy consumption demonstrates the effectiveness of the approach and highlights its potential for application in practical synthetic data generation. Our approach is not meant to replace existing generative models, but rather as a lightweight post-processing step that enforces exact Pearson correlation to an already generated synthetic dataset.
- Europe > Sweden > Värmland County > Karlstad (0.05)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.05)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > Colorado (0.04)
Identifying Large-Scale Linear Parameter Varying Systems with Dynamic Mode Decomposition Methods
Jordanou, Jean Panaioti, Camponogara, Eduardo, Gildin, Eduardo
Linear Parameter Varying (LPV) Systems are a well-established class of nonlinear systems with a rich theory for stability analysis, control, and analytical response finding, among other aspects. Although there are works on data-driven identification of such systems, the literature is quite scarce in terms of works that tackle the identification of LPV models for large-scale systems. Since large-scale systems are ubiquitous in practice, this work develops a methodology for the local and global identification of large-scale LPV systems based on nonintrusive reduced-order modeling. The developed method is coined as DMD-LPV for being inspired in the Dynamic Mode Decomposition (DMD). To validate the proposed identification method, we identify a system described by a discretized linear diffusion equation, with the diffusion gain defined by a polynomial over a parameter. The experiments show that the proposed method can easily identify a reduced-order LPV model of a given large-scale system without the need to perform identification in the full-order dimension, and with almost no performance decay over performing a reduction, given that the model structure is well-established.
- South America > Brazil (0.28)
- North America > United States (0.28)
Subsampling, aligning, and averaging to find circular coordinates in recurrent time series
Blumberg, Andrew J., Carrière, Mathieu, Fung, Jun Hou, Mandell, Michael A.
We introduce a new algorithm for finding robust circular coordinates on data that is expected to exhibit recurrence, such as that which appears in neuronal recordings of C. elegans. Techniques exist to create circular coordinates on a simplicial complex from a dimension 1 cohomology class, and these can be applied to the Rips complex of a dataset when it has a prominent class in its dimension 1 cohomology. However, it is known this approach is extremely sensitive to uneven sampling density. Our algorithm comes with a new method to correct for uneven sampling density, adapting our prior work on averaging coordinates in manifold learning. We use rejection sampling to correct for inhomogeneous sampling and then apply Procrustes matching to align and average the subsamples. In addition to providing a more robust coordinate than other approaches, this subsampling and averaging approach has better efficiency. We validate our technique on both synthetic data sets and neuronal activity recordings. Our results reveal a topological model of neuronal trajectories for C. elegans that is constructed from loops in which different regions of the brain state space can be mapped to specific and interpretable macroscopic behaviors in the worm.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Indiana (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
Resampling and averaging coordinates on data
Blumberg, Andrew J., Carriere, Mathieu, Fung, Jun Hou, Mandell, Michael A.
We introduce algorithms for robustly computing intrinsic coordinates on point clouds. Our approach relies on generating many candidate coordinates by subsampling the data and varying hyperparameters of the embedding algorithm (e.g., manifold learning). We then identify a subset of representative embeddings by clustering the collection of candidate coordinates and using shape descriptors from topological data analysis. The final output is the embedding obtained as an average of the representative embeddings using generalized Procrustes analysis.
- North America > United States > Indiana (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Portugal > Lisbon > Lisbon (0.04)
- Europe > France (0.04)
- Research Report (0.50)
- Workflow (0.48)
Learning Explicitly Conditioned Sparsifying Transforms
Pătraşcu, Andrei, Rusu, Cristian, Irofti, Paul
Sparsifying transforms became in the last decades widely known tools for finding structured sparse representations of signals in certain transform domains. Despite the popularity of classical transforms such as DCT and Wavelet, learning optimal transforms that guarantee good representations of data into the sparse domain has been recently analyzed in a series of papers. Typically, the conditioning number and representation ability are complementary key features of learning square transforms that may not be explicitly controlled in a given optimization model. Unlike the existing approaches from the literature, in our paper, we consider a new sparsifying transform model that enforces explicit control over the data representation quality and the condition number of the learned transforms. We confirm through numerical experiments that our model presents better numerical behavior than the state-of-the-art.
- Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.04)
- North America > United States > New York (0.04)
Symmetrized Robust Procrustes: Constant-Factor Approximation and Exact Recovery
Amir, Tal, Kovalsky, Shahar, Dym, Nadav
The classical $\textit{Procrustes}$ problem is to find a rigid motion (orthogonal transformation and translation) that best aligns two given point-sets in the least-squares sense. The $\textit{Robust Procrustes}$ problem is an important variant, in which a power-1 objective is used instead of least squares to improve robustness to outliers. While the optimal solution of the least-squares problem can be easily computed in closed form, dating back to Sch\"onemann (1966), no such solution is known for the power-1 problem. In this paper we propose a novel convex relaxation for the Robust Procrustes problem. Our relaxation enjoys several theoretical and practical advantages: Theoretically, we prove that our method provides a $\sqrt{2}$-factor approximation to the Robust Procrustes problem, and that, under appropriate assumptions, it exactly recovers the true rigid motion from point correspondences contaminated by outliers. In practice, we find in numerical experiments on both synthetic and real robust Procrustes problems, that our method performs similarly to the standard Iteratively Reweighted Least Squares (IRLS). However the convexity of our algorithm allows incorporating additional convex penalties, which are not readily amenable to IRLS. This turns out to be a substantial advantage, leading to improved results in high-dimensional problems, including non-rigid shape alignment and semi-supervised interlingual word translation.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > North Carolina (0.04)
- (4 more...)
Automatic Registration and Convex Clustering of Time Series
Weylandt, Michael, Michailidis, George
Clustering of time series data exhibits a number of challenges not present in other settings, notably the problem of registration (alignment) of observed signals. Typical approaches include pre-registration to a user-specified template or time warping approaches which attempt to optimally align series with a minimum of distortion. For many signals obtained from recording or sensing devices, these methods may be unsuitable as a template signal is not available for pre-registration, while the distortion of warping approaches may obscure meaningful temporal information. We propose a new method for automatic time series alignment within a convex clustering problem. Our approach, Temporal Registration using Optimal Unitary Transformations (TROUT), is based on a novel distance metric between time series that is easy to compute and automatically identifies optimal alignment between pairs of time series. By embedding our new metric in a convex formulation, we retain well-known advantages of computational and statistical performance. We provide an efficient algorithm for TROUT-based clustering and demonstrate its superior performance over a range of competitors.
Two Way Adversarial Unsupervised Word Translation
Word translation is a problem in machine translation that seeks to build models that recover word level correspondence between languages. Recent approaches to this problem have shown that word translation models can learned with very small seeding dictionaries, and even without any starting supervision. In this paper we propose a method to jointly find translations between a pair of languages. Not only does our method learn translations in both directions but it improves accuracy of those translations over past methods.
Unsupervised Hierarchy Matching with Optimal Transport over Hyperbolic Spaces
Alvarez-Melis, David, Mroueh, Youssef, Jaakkola, Tommi S.
This paper focuses on the problem of unsupervised alignment of hierarchical data such as ontologies or lexical databases. This is a problem that appears across areas, from natural language processing to bioinformatics, and is typically solved by appeal to outside knowledge bases and label-textual similarity. In contrast, we approach the problem from a purely geometric perspective: given only a vector-space representation of the items in the two hierarchies, we seek to infer correspondences across them. Our work derives from and interweaves hyperbolic-space representations for hierarchical data, on one hand, and unsupervised word-alignment methods, on the other. We first provide a set of negative results showing how and why Euclidean methods fail in this hyperbolic setting. We then propose a novel approach based on optimal transport over hyperbolic spaces, and show that it outperforms standard embedding alignment techniques in various experiments on cross-lingual WordNet alignment and ontology matching tasks.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Italy (0.04)
- Research Report > New Finding (0.48)
- Research Report > Promising Solution (0.34)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.77)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)
Towards Optimal Transport with Global Invariances
Alvarez-Melis, David, Jegelka, Stefanie, Jaakkola, Tommi S.
Many problems in machine learning involve calculating correspondences between sets of objects, such as point clouds or images. Discrete optimal transport (OT) provides a natural and successful approach to such tasks whenever the two sets of objects can be represented in the same space or when we can evaluate distances between the objects. Unfortunately neither requirement is likely to hold when object representations are learned from data. Indeed, automatically derived representations such as word embeddings are typically fixed only up to some global transformations, for example, reflection or rotation. As a result, pairwise distances across the two types of objects are ill-defined without specifying their relative transformation. In this work, we propose a general framework for optimal transport in the presence of latent global transformations. We discuss algorithms for the specific case of orthonormal transformations, and show promising results in unsupervised word alignment.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)